DETAILED ACTION

Response to Amendment 

1. This Office Action is made responsive to applicant's remarks received on 28 August 
2008. Claims 1-15, and 17-21 are pending. Claim 22 is new. Claims 1-5, 17-21 stand rejected. 

Claim Rejections - 35 USC § 103 

2. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or 
described as set forth in section 102 of this title, if the differences between the subject 
matter sought to be patented and the prior art are such that the subject matter as a 
whole would have been obvious at the time the invention was made to a person having 
ordinary skill in the art to which said subject matter pertains. Patentability shall not be 
negatived by the manner in which the invention was made. 

3. Claims 1-21 are rejected under 35 U.S.C. 103(a) as being unpatentable over Chow et al.
(US 6,292,589 B1) in combination with Whitted et al., "A Software Testbed for the Development
of 3D Raster Graphics Systems," and Fuchs et al., "Pixel Planes 5: A Heterogeneous
Multiprocessor Graphics System Using Processor-Enhanced Memories," ACM, Computer
Graphics, Volume 23, no. 3, July 1989, pages 79-88.

Regarding Claim 1: (Previously Presented) Chow discloses a method of implementing a DCT
in a GPU ("At step 442, a Discrete Cosine Transform (DCT) is applied to the block of pixels to
provide image enhancement, restoration, and facilitate encoding of the image." at column 28,
line 22; "FIG. 2 is a block diagram of a computer system incorporating the present invention" at
column 3, line 28, which incorporates a graphics controller (Figure 2, numeral 26)), comprising:



separating an image into blocks of pixels (Refer to Figure 5b; "The method includes 
compressing, using the assigned quantization values, a macro block such that a resultant
compressed macro block is represented by a subset of bits used to represent said macro block." 
at column 3, line 4); 

multiplying a column or row of pixels with a predetermined matrix to generate a corresponding
set of output pixels ("Similarly, the order of operations is important to developing the optimal
solution ... by allowing IDCT and DCT to be executed in parallel." at column 45, line 36;
"Referring now to FIG. 31A, the above described approach to DCT and IDCT computing can be
provided via the DCT Unit data path implementation 674, which is shown to include 4 functional
units. The fourth unit is a multiplier unit 678." at column 45, line 41);

determining sets of scanlines based on the sets of output pixels ("The spider diagram may be
read left to right and by interpreting constants above a horizontal scaling line (k1-k10) as scaling
factors, and where two lines meet at a vertex a summation occurs." at column 45, line 23);

and for each set of scanlines, sampling at least a portion of the pixels comprised within the 
scanlines and pixels relative to the scanlines, and multiplying the sampled pixels with a row or 
column of the predetermined matrix ("Here, the coefficients are stored using the specific 
ordering and location in structure 720 to support transformation of the 8x8 pixel array of FIG. 
32." at column 47, line 36). 

Fuchs teaches processing each block of pixels, in parallel, within at least one shader module, 
the processing comprising ("Techniques are described for volume rendering at multiple frames 
per second, font generation directly from conic spline descriptions, and rapid calculation of 
radiosity form factors. The hardware consists of up to 32 math oriented processors, up to 16 
rendering units, and a conventional 1280x1024 pixel frame buffer, interconnected by a 5 gigabit 
ring network. Each rendering unit consists of a 128 x 128 pixel array of processors with memory
with parallel quadratic expression evaluation for every pixel." at abstract; further at page 81,
Section 4-"Parallel Rendering by Screen-Space Subdivision"):

Whitted teaches wherein said multiplying a column or row of pixels with a predetermined matrix 
to generate a corresponding set of output pixels, determining, and sampling the pixels are 
performed by said at least one shader module. (Refer to Figure 1, at page 44). 

All these claimed elements were known methods and computer-designed algorithms for application
in interactive 3D graphics, specifically by implementing a DCT in a GPU. The skilled artisan
could have combined and/or substituted the method of processing the blocks of pixels as taught 
by Fuchs with the method of performing that processing in a shader module as taught by 
Whitted to obtain the specified claimed elements of Claim 1. The skilled artisan could have 
combined these claimed elements by these known methods and there would have been no 
change in their respective functions. 

Therefore, the combination/substitution of these claimed elements would have yielded 
predictable results to one of ordinary skill in the art at the time of the invention. The combination 
of the disclosure of Chow in view of Whitted and Fuchs would have obtained the specified claim
elements of Claim 1, which are thereby obvious. Additionally, "shaders profoundly affect the realism
that can be achieved in computer generated images." (Whitted, page 43, paragraph 2 under 
subsection "Design Philosophy"). 

For clarity, the processor of Fuchs as stated at page 81, Section 4, "Parallel Rendering by
Screen-Space Subdivision"; the author details "that the parallel processing that occurs is
rendered on a pixel-pixel basis." (further at page 81, paragraphs 1-6). The substitution of this
form of processing meets the limitation of processing each block of pixels, in parallel. The 
combination of Fuchs with Whitted also meets the limitation of processing within at least one 
shader module. Therefore, the combination of these forms of processing (Fuchs and Whitted) 
and further in combination with Chow makes these claims obvious to the skilled artisan to make 
and combine to yield predictable results. 

Regarding Claim 2: (Previously presented) Chow teaches the multiplying a column or row of 
pixels with a predetermined matrix to generate a corresponding set of output pixels, 
determining, and sampling the pixels are performed in the GPU ("Referring now to FIG. 31A, the
above described approach to DCT and IDCT computing can be provided via the DCT Unit data 
path implementation 674, which is shown to include 4 functional units. The first is the double 
buffer operand store 646. The second and third functional units are adders 676 and 677. Each 
adder has four associated scratchpad registers 675. These registers are 2 write/2 read port 
registers. Each adder is capable of performing 2's complement addition or subtraction. The 
fourth unit is a multiplier unit 678." at column 45, line 41).

Regarding Claim 3: (Original) Chow teaches each corresponding set of output pixels
corresponds to a textured line across the pixels in the blocks of pixels (Referring to Figure 19(b)
and Figure 21, the macroblock templates to be inputted are considered at step 464 (Figure 21)
to be bidirectional, which resolves that the output pixels will correspond to a textured line across the
pixels). 

Regarding Claim 4: (Original) Chow teaches wherein sampling the pixels comprised within the
scanlines comprises using a separate shader for each set of scanlines (Referring to Figures
6(a)-6(c); "Referring briefly to FIGS. 6A and 6B, the motion estimation process will be described
with reference to a series of frames 60. Each frame of the series 60 includes pixels designated
via (x, y) coordinates..." "As seen in FIG. 6B, motion estimation is shown to include 3 discrete
steps; a block matching step 66, a motion vector generation step 67 and an energy calculation 
step 68. Block-matching techniques are used to identify macro blocks in the preceding (and/or 
succeeding) frames, which have the best match of pixel values to the macroblock of interest in 
the current frame. The macroblock matching procedure may be performed using a series of 
adder circuits or other methods apparent to those in the art." at column 10, line 18). 

Regarding Claim 5: (Original) Chow discloses defining an array of coordinate
offsets to neighboring pixels, wherein the shader accesses the pixels in the scanlines using the 
offset array ("Here, the coefficients are stored using the specific ordering and location in 
structure 720 to support transformation of the 8 x 8 pixel array of FIG. 32." at column 47, line 
36). 

Regarding Claim 6: (Original) Chow teaches the same shader can be used for each
pixel in a scanline ("The DFU is responsible for reducing the amount of video data by means of
sub-sampling and decimation of horizontal scan lines as they arrive by optionally keeping only
half the scan lines, either even or odd." at column 7, line 28; "The hardware or circuit used to
perform the DCT transform must be made as fast and as simple as possible. It is highly
desirable to use the same physical logic gate for as many parts of the transform as possible,
since to do so results in the fewest number of transistors needed to perform the operation. The
fewer the number of transistors used, the faster and more economical the circuit will be." at
column 45, line 66).
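
By way of illustration only, the scan-line decimation quoted from Chow at column 7, line 28 (keeping only the even or only the odd scan lines) can be sketched as follows. The sketch assumes a frame stored as a list of scan lines with the first line counted as line 0 (even); the function name decimate_scanlines and its keep parameter are hypothetical and appear in neither Chow nor the claims.

```python
def decimate_scanlines(frame, keep="even"):
    """Keep only half of the scan lines of a frame.

    `frame` is assumed to be a list of scan lines (each a list of pixels),
    with the first scan line counted as line 0, i.e. "even".
    """
    if keep not in ("even", "odd"):
        raise ValueError("keep must be 'even' or 'odd'")
    start = 0 if keep == "even" else 1
    # Slicing with a step of 2 drops every other scan line.
    return frame[start::2]


# Six scan lines of four pixels each; every pixel holds its line index.
frame = [[y] * 4 for y in range(6)]
even_lines = decimate_scanlines(frame, "even")
odd_lines = decimate_scanlines(frame, "odd")
```

Either choice halves the number of scan lines while leaving the pixels within each retained line untouched.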

Regarding Claim 7: (Presently Presented) Chow discloses a method of processing pixels,
comprising: separating an image into blocks of pixels (Refer to Figure 5b; "The method includes 
compressing, using the assigned quantization values, a macroblock such that a resultant 
compressed macroblock is represented by a subset of bits used to represent said macroblock." 
at column 3, line 4); 

creating a polyline of pixels for each column or row in each block of pixels ("The 8x8 2-D DCT is
performed by evaluating the eight 1-D row transforms, then evaluating these results through 8
column transforms," at column 45, line 20); and

creating a line for each row or column in each block of pixels, ("The 8x8 2-D DCT is performed
by evaluating the eight 1-D row transforms, then evaluating these results through 8 column
transforms." at column 45, line 20); wherein the rows or columns correspond to the polylines
created for each column or row; ("The spider diagram may be read left to right and by
interpreting constants above a horizontal scaling line (k1-k10) as scaling factors, and where two
lines meet at a vertex a summation occurs." at column 45, line 23);

Fuchs teaches processing each block of pixels, in parallel, within at least one shader module,
("Techniques are described for volume rendering at multiple frames per second, font generation
directly from conic spline descriptions, and rapid calculation of radiosity form factors. The
hardware consists of up to 32 math oriented processors, up to 16 rendering units, and a
conventional 1280x1024 pixel frame buffer, interconnected by a 5 gigabit ring network. Each
rendering unit consists of a 128 x 128 pixel array of processors with memory with parallel
quadratic expression evaluation for every pixel." at abstract; further at page 81, Section 4-
"Parallel Rendering by Screen-Space Subdivision"):

Whitted teaches wherein said creating a polyline and creating a line are performed by said at 
least one shader module (Refer to Figure 1, at page 44).

All these claimed elements were known methods and computer-designed algorithms for application
in interactive 3D graphics, specifically by implementing a DCT in a GPU. The skilled artisan
could have combined and/or substituted the method of processing the blocks of pixels as taught 
by Fuchs with the method of performing that processing in a shader module as taught by 
Whitted to obtain the specified claimed elements of Claim 7. The skilled artisan could have 
combined these claimed elements by these known methods and there would have been no 
change in their respective functions. 

Therefore, the combination/substitution of these claimed elements would have yielded 
predictable results to one of ordinary skill in the art at the time of the invention. The combination 
of the disclosure of Chow in view of Whitted and Fuchs would have obtained the specified claim
elements of Claim 7, which are thereby obvious. Additionally, "shaders profoundly affect the realism
that can be achieved in computer generated images." (Whitted, page 43, paragraph 2 under 
subsection "Design Philosophy"). 

For clarity, the processor of Fuchs as stated at page 81, Section 4, "Parallel Rendering by
Screen-Space Subdivision"; the author details "that the parallel processing that occurs is
rendered on a pixel-pixel basis." (further at page 81, paragraphs 1-6). The substitution of this
form of processing meets the limitation of processing each block of pixels, in parallel. The 
combination of Fuchs with Whitted also meets the limitation of processing within at least one 
shader module. Therefore, the combination of these forms of processing (Fuchs and Whitted) 
and further in combination with Chow makes these claims obvious to the skilled artisan to make 
and combine to yield predictable results. 

Regarding Claim 8: (Original) Chow teaches creating a polyline of pixels for each row or 
column in each block of pixels ("The 8x8 2-D DCT is performed by evaluating the eight 1-D row 
transforms, then evaluating these results through 8 column transforms." at column 45, line 20); 
and creating a line for each column or row in each block of pixels, wherein the rows or columns 
correspond to the polylines created for each row or column (Refer to Figure 32; "FIG. 32 
illustrates a partitioning of a block of video data into left and right halves for row transforms, and 
into top and bottom halves for column transforms, for purposes of the DCT operation of FIG. 31" 
at column 4, line 52). 
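
By way of illustration only, the row-column decomposition quoted from Chow at column 45, line 20 (eight 1-D row transforms followed by eight column transforms) can be sketched as follows. The sketch assumes an orthonormal DCT-II basis and plain Python lists; the names dct_matrix, dct_1d, and dct_2d are hypothetical and appear in neither Chow nor the claims, and the sketch does not reproduce the data path of FIG. 31A.

```python
import math

N = 8  # block size assumed from the quoted "8x8 2-D DCT"


def dct_matrix(n=N):
    """Build an n x n DCT-II basis matrix (orthonormal scaling is an assumption)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows


def dct_1d(vec, mat):
    """Multiply one row or column of pixels by the predetermined matrix."""
    return [sum(c * v for c, v in zip(coeffs, vec)) for coeffs in mat]


def dct_2d(block):
    """Apply the 1-D transform to every row, then to every column."""
    mat = dct_matrix(len(block))
    row_pass = [dct_1d(row, mat) for row in block]
    col_pass = [dct_1d(list(col), mat) for col in zip(*row_pass)]
    # col_pass[j] holds transformed column j; transpose back to row-major order.
    return [list(r) for r in zip(*col_pass)]


# A flat block of constant intensity has energy only in the DC coefficient.
flat = [[10.0] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

Because the 2-D transform is separable, each pass is a set of independent matrix-vector products, one per row or column.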

Regarding Claim 9: (Original) Chow teaches determining sets of scanlines based on the lines
created for each row or column in each block of pixels; and for each set of scanlines, sampling 
the pixels comprised within the scanlines and multiplying the sampled pixels with a row or 
column of a predetermined matrix ("Here, the coefficients are stored using the specific ordering 
and location in structure 720 to support transformation of the 8x8 pixel array of FIG. 32." at 
column 47, line 36). 

Regarding Claim 10: (Original) Chow teaches the steps of creating are performed in a graphics
processing unit (GPU) (Referring to Figure 28, numeral 20 (PCI Local Bus), the portion of
Figure 28 allows the graphics controller (26) of Figure 2 to show that it is connected to the PCI 
Local bus). 

Regarding Claim 11: (Previously Presented) Chow discloses a method of processing pixels, 
comprising: separating an image into blocks of pixels (Refer to Figure 5b; "The method includes 
compressing, using the assigned quantization values, a macroblock such that a resultant 
compressed macroblock is represented by a subset of bits used to represent said macroblock," 
at column 3, line 4); 

determining a polyline of pixels for each column or row in each block of pixels ("The 8x8 2-D 
DCT is performed by evaluating the eight 1-D row transforms, then evaluating these results 
through 8 column transforms." at column 45, line 20); 

for each pixel in the polyline, sampling at least a portion of the other pixels in the corresponding 
column or row that lies along the polyline and pixels relative to the column or row ("Here, the 
coefficients are stored using the specific ordering and location in structure 720 to support 
transformation of the 8 x 8 pixel array of FIG. 32." at column 47, line 36);

multiplying each of the other pixels by a DCT coefficient from a predetermined matrix to
generate resultant values; and adding the resultant values together to generate a resulting value 
("Referring now to FIG. 31A, the above described approach to DCT and IDCT computing can be 
provided via the DCT Unit data path implementation 674, which is shown to include 4 functional 
units. The fourth unit is a multiplier unit 678," at column 45, line 41); 

Whitted teaches wherein said multiplying and adding are performed by said at least one shader
module (Refer to Figure 1, at page 44, block labeled "Transformations...").

Fuchs teaches processing each block within at least one shader module: ("Techniques are
described for volume rendering at multiple frames per second, font generation directly from 
conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists 
of up to 32 math oriented processors, up to 16 rendering units, and a conventional 1280x1024 
pixel frame buffer, interconnected by a 5 gigabit ring network. Each rendering unit consists of a 
128 x 128 pixel array of processors with memory with parallel quadratic expression evaluation 
for every pixel." at abstract; further at page 81, Section 4-"Parallel Rendering by Screen-Space 
Subdivision"): 

All these claimed elements were known methods and computer-designed algorithms for application
in interactive 3D graphics, specifically by implementing a DCT in a GPU. The skilled artisan
could have combined and/or substituted the method of processing the blocks of pixels as taught 
by Fuchs with the method of performing that processing in a shader module as taught by 
Whitted to obtain the specified claimed elements of Claim 11. The skilled artisan could have 
combined these claimed elements by these known methods and there would have been no 
change in their respective functions. 

Therefore, the combination/substitution of these claimed elements would have yielded 
predictable results to one of ordinary skill in the art at the time of the invention. The combination 
of the disclosure of Chow in view of Whitted and Fuchs would have obtained the specified claim
elements of Claim 11, which are thereby obvious. Additionally, "shaders profoundly affect the realism
that can be achieved in computer generated images." (Whitted, page 43, paragraph 2 under
subsection "Design Philosophy"). 

For clarity, the processor of Fuchs as stated at page 81, Section 4, "Parallel Rendering by 
Screen-Space Subdivision"; the author details "that the parallel processing that occurs is 
rendered on a pixel-pixel basis." (further at page 81, paragraphs 1-6). The substitution of this
form of processing meets the limitation of processing each block of pixels, in parallel. The 
combination of Fuchs with Whitted also meets the limitation of processing within at least one 
shader module. Therefore, the combination of these forms of processing (Fuchs and Whitted) 
and further in combination with Chow makes these claims obvious to the skilled artisan to make 
and combine to yield predictable results. 

Regarding Claim 12: (Original) Chow teaches biasing and scaling at least one of the polyline of 
pixels, the resultant values, and each resulting value for each pixel ("Prior to writing the row or 
column results into the double buffer 646, each result must be rounded via an incrementer 681, 
which is a non-biased two's complement rounding unit." at column 45, line 51). 

Regarding Claim 13: (Previously Presented) Chow discloses a method of processing pixels
comprising: separating an image into blocks of pixels (Refer to Figure 5b; "The method includes 
compressing, using the assigned quantization values, a macroblock such that a resultant 
compressed macroblock is represented by a subset of bits used to represent said macroblock." 
at column 3, line 4); for each column in a block of pixels, setting up a shader and rendering a 
scanline; and for each row in a block of pixels, setting up a shader and rendering a column; 
(Referring to Figures 6(a)-6(c); "Referring briefly to FIGS. 6A and 6B, the motion estimation
process will be described with reference to a series of frames 60. Each frame of the series 60
includes pixels designated via (x, y) coordinates..." "As seen in FIG. 6B, motion estimation is
shown to include 3 discrete steps; a block matching step 66, a motion vector generation step 67
and an energy calculation step 68. Block-matching techniques are used to identify macroblocks
in the preceding (and/or succeeding) frames, which have the best match of pixel values to the
macroblock of interest in the current frame. The macroblock matching procedure may be
performed using a series of adder circuits or other methods apparent to those in the art." at 
column 10, line 18); 

Whitted teaches and wherein the setting up and the rendering are performed by said at least 
one shader module (Refer to Figure 1, at page 44, block labeled "Transformations..."). 

Fuchs teaches processing each block within at least one shader module: ("Techniques are
described for volume rendering at multiple frames per second, font generation directly from 
conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists 
of up to 32 math oriented processors, up to 16 rendering units, and a conventional 1280x1024 
pixel frame buffer, interconnected by a 5 gigabit ring network. Each rendering unit consists of a 
128 x 128 pixel array of processors with memory with parallel quadratic expression evaluation 
for every pixel." at abstract; further at page 81, Section 4-"Parallel Rendering by Screen-Space
Subdivision"): 

All these claimed elements were known methods and computer-designed algorithms for application
in interactive 3D graphics, specifically by implementing a DCT in a GPU. The skilled artisan
could have combined and/or substituted the method of processing the blocks of pixels as taught
by Fuchs with the method of performing that processing in a shader module as taught by 
Whitted to obtain the specified claimed elements of Claim 13. The skilled artisan could have 
combined these claimed elements by these known methods and there would have been no 
change in their respective functions. 

Therefore, the combination/substitution of these claimed elements would have yielded 
predictable results to one of ordinary skill in the art at the time of the invention. The combination 
of the disclosure of Chow in view of Whitted and Fuchs would have obtained the specified claim
elements of Claim 13, which are thereby obvious. Additionally, "shaders profoundly affect the realism
that can be achieved in computer generated images." (Whitted, page 43, paragraph 2 under 
subsection "Design Philosophy"). 

For clarity, the processor of Fuchs as stated at page 81, Section 4, "Parallel Rendering by 
Screen-Space Subdivision"; the author details "that the parallel processing that occurs is 
rendered on a pixel-pixel basis." (further at page 81, paragraphs 1-6). The substitution of this
form of processing meets the limitation of processing each block of pixels, in parallel. The 
combination of Fuchs with Whitted also meets the limitation of processing within at least one 
shader module. Therefore, the combination of these forms of processing (Fuchs and Whitted) 
and further in combination with Chow makes these claims obvious to the skilled artisan to make 
and combine to yield predictable results. 

Regarding Claim 14: (Currently Amended) Chow teaches setting up the shaders and the
rendering are performed in the GPU ("Referring now to FIG. 2, a computer system 10 for use
with the present invention is shown to include a central processing unit (CPU) 12 ... Also coupled
to the PCI bus is a graphics controller 26..." at column 5, line 44).



Regarding Claim 15: (Currently Amended) Chow discloses a system to program a GPU to
implement a DCT, (Referring to Figure 2, numeral 26 (Graphics controller) and "At step 442, a
Discrete Cosine Transform (DCT) is applied to the block of pixels to provide image 
enhancement, restoration, and facilitate encoding of the image." at column 26, line 22), 
comprising: adapting a processing unit to receive blocks of pixels into which an image has been 
separated, (Referring to Figure 28, numeral 20 (PCI Local Bus), the portion of Figure 28 allows 
the graphics controller (26) of Figure 2 to show that it is connected to the PCI Local bus); 
("Similarly, the order of operations is important to developing the optimal solution ... by allowing
IDCT and DCT to be executed in parallel" at column 45, line 36)
multiplying a column or row of pixels of an image with a predetermined matrix to generate a
corresponding set of output pixels ("Referring now to FIG. 31A, the above described approach to
DCT and IDCT computing can be provided via the DCT Unit data path implementation 674,
which is shown to include 4 functional units. The fourth unit is a multiplier unit 678." at column 
45, line 41); 

determining sets of scanlines based on the sets of output pixels (Referring to Figure 33, numeral
651 (RAM Address Wordline); "Here, the coefficients are stored using the specific ordering and
location in structure 720 to support transformation of the 8x8 pixel array of FIG. 32." at
column 47, line 36); and 



for each set of scanlines, sampling the pixels comprised within the scanlines and multiplying the 
sampled pixels with a row or column of the predetermined matrix (Referring to Figure 33, DCT
Double Buffer Addressing; "Operands 0 and 7 are found stored on address line 0 in diagram
651, with operand 0 on the left half and operand 7 on the right half, the same order as was
found for operands 2 and 5," at column 48, line 10);

Whitted teaches and wherein said setting up and the rendering are performed by said at least 
one shader module (Refer to Figure 1, at page 44, block labeled "Transformations...").

Fuchs teaches processing each block of pixels, in parallel, within at least one shader
module ("Techniques are described for volume rendering at multiple frames per second, font
generation directly from conic spline descriptions, and rapid calculation of radiosity form factors. 
The hardware consists of up to 32 math oriented processors, up to 16 rendering units, and a 
conventional 1280x1024 pixel frame buffer, interconnected by a 5 gigabit ring network. Each 
rendering unit consists of a 128 x 128 pixel array of processors with memory with parallel 
quadratic expression evaluation for every pixel." at abstract; further at page 81, Section 4-
"Parallel Rendering by Screen-Space Subdivision"):

All these claimed elements were known methods and computer-designed algorithms for application
in interactive 3D graphics, specifically by implementing a DCT in a GPU. The skilled artisan
could have combined and/or substituted the method of processing the blocks of pixels as taught 
by Fuchs with the method of performing that processing in a shader module as taught by 
Whitted to obtain the specified claimed elements of Claim 13. The skilled artisan could have 
combined these claimed elements by these known methods and there would have been no 
change in their respective functions. 

Therefore, the combination/substitution of these claimed elements would have yielded
predictable results to one of ordinary skill in the art at the time of the invention. The combination
of the disclosure of Chow in view of Whitted and Fuchs would have obtained the specified claim
elements of Claim 13, which are thereby obvious. Additionally, "shaders profoundly affect the realism
that can be achieved in computer generated images." (Whitted, page 43, paragraph 2 under 
subsection "Design Philosophy"). 

For clarity, the processor of Fuchs as stated at page 81, Section 4, "Parallel Rendering by 
Screen-Space Subdivision"; the author details "that the parallel processing that occurs is 
rendered on a pixel-pixel basis." (further at page 81, paragraphs 1-6). The substitution of this
form of processing meets the limitation of processing each block of pixels, in parallel. The 
combination of Fuchs with Whitted also meets the limitation of processing within at least one 
shader module. Therefore, the combination of these forms of processing (Fuchs and Whitted) 
and further in combination with Chow makes these claims obvious to the skilled artisan to make 
and combine to yield predictable results. 

Regarding Claim 16: (Canceled) 

Regarding Claim 17: (Previously presented) Chow teaches a CPU coupled to the GPU by a 
system bus, the CPU capable of separating the image into the blocks of pixels ("Referring now to
FIG. 2, a computer system 10 for use with the present invention is shown to include a central
processing unit (CPU) 12 ... Also coupled to the PCI bus is a graphics controller 26..." at column
5, line 44). 

Regarding Claim 18: (Previously presented) Chow teaches the system that equally resembles
the method of claim 3. Claim 18 stands rejected for the same reasons as stated at Claim
3.

Regarding Claim 19: (Previously presented) Chow teaches the GPU comprises a separate 
shader for sampling the pixels comprised within each set of the scanlines ("Raw, analog video 
data are received by the color decoder 33 ... according to the CCIR601 standard at either an
NTSC format of 720 pixels x 480 scan lines at 29.97 frames/second, or PAL format of 720 pixels
x 576 lines at 25 frames per second." at column 6, line 18).

Regarding Claim 20: (Original) Chow teaches the GPU defines an array of coordinate offsets 
to neighboring pixels, wherein the shader accesses the pixels in the scanlines using the offset 
array ("Here, the coefficients are stored using the specific ordering and location in structure 720 
to support transformation of the 8 x 8 pixel array of FIG. 32." at column 47, line 36).

Regarding Claim 21: (Original) Chow teaches the same shader can be used for each pixel in a
scanline ("The DFU is responsible for reducing the amount of video data by means of
sub-sampling and decimation of horizontal scan lines as they arrive by optionally keeping only half
the scan lines, either even or odd." at column 7, line 28; "The hardware or circuit used to perform
the DCT transform must be made as fast and as simple as possible. It is highly desirable to use
the same physical logic gate for as many parts of the transform as possible, since to do so 
results in the fewest number of transistors needed to perform the operation. The fewer the 
number of transistors used, the faster and more economical the circuit will be." at column 45, 
line 66). 

4. Claim 22 is rejected under 35 U.S.C. 103(a) as being unpatentable over Duluk et al. (US
6597363 B1) in combination with Whitted et al., "A Software Testbed for the Development of 3D
Raster Graphics Systems," and Fuchs et al., "Pixel Planes 5: A Heterogeneous Multiprocessor
Graphics System Using Processor-Enhanced Memories," ACM, Computer Graphics, Volume 23,
no. 3, July 1989, pages 79-88.

Regarding Claim 22: (New) Duluk teaches a method of implementing low-level video processing
in a GPU ("Embodiments of the invention may include one or more of deferred shading, a tiled
frame buffer, and multiple-stage hidden surface removal processing, as well as other structures
and/or procedures. Embodiments of the present invention are designed to provide
high-performance 3D graphics with Phong shading, subpixel anti-aliasing, and texture- and
bump-mappings." at abstract), comprising:

processing each block of pixels, in parallel, within at least one shader module (Refer to Figure
15, numeral 3000; "Finally, if there is any Gouraud shading in the frame, the Geometry block
calculates the vertex colors that the Fragment block uses to perform the shading." at column 44,
line 12), the processing comprising:

multiplying a column or row of pixels with a predetermined matrix to generate a corresponding 
set of output pixels, determining, and sampling the pixels are performed in the GPU ("Z-buffer 
rendering works well and requires no elaborate hardware. However, it typically results in a 
great deal of wasted processing effort if the scene contains many hidden surfaces. In complex 
scenes, the renderer may calculate color values for ten or twenty times as many pixels as are 
visible in the final picture. This means the computational cost of any per-pixel operation, such
as Phong shading or texture-mapping, is multiplied by ten or twenty. The number of surfaces
per pixel, averaged over an entire frame, is called the depth complexity of the frame. In 
conventional z-buffered renderers, the depth complexity is a measure of the renderer's 
inefficiency when rendering a particular frame." at column 13, line 59); 

determining sets of scanlines based on the sets of output pixels ("The spider diagram may be 
read left to right and by interpreting constants above a horizontal scaling line (k1-k10) as scaling 
factors, and where two lines meet at a vertex a summation occurs." at column 45, line 23); 

and for each set of scanlines, sampling at least a portion of the pixels comprised within the 
scanlines and pixels relative to the scanlines ("Here, the coefficients are stored using the 
specific ordering and location in structure 720 to support transformation of the 8 x 8 pixel array 
of FIG. 32." at column 47, line 36). 

and multiplying the sampled pixels with a row or column of the predetermined matrix using a 
separate shader for each set of scanlines (Refer to column 14, lines 5-18);

defining an array of coordinate offsets to neighboring pixels, wherein the shader accesses the 
pixels in the scanlines using the offset array ("All of these records are accessed via pointers. 
Each primitive entry in Sort Memory contains a Color Pointer to the corresponding Color entry in 
Polygon Memory. The Color Pointer includes a Color Address, Color Offset and Color Type that 
allows us to construct a point, line, or triangle and locate the MLM pointers. The Color Address 
points to the final vertex in the primitive. Vertices are stored in order, so the vertices in a 
primitive are adjacent, except in the case of triangle fans. The Color Offset points back from the 
Color Address to the first dualoct for this vertex list. (We will refer to a point list, line strip,
triangle strip, or triangle fan as a vertex list.) This first dualoct contains pointers to the MLM data
for the points, lines, strip, or fan in the vertex list. The subsequent dualocts in the vertex list 
contain Color data entries. For triangle fans, the three vertices for the triangle are at Color 
Address, (Color Address-1), and (Color Address-Color Offset +1). Note that this is not quite the 
same as the way pointers are stored in Sort memory." at column 27, line 57);

Whitted teaches said multiplying a column or row of pixels with a predetermined matrix to 
generate a corresponding set of output pixels, determining, and sampling the pixels are 
performed by said at least one shader module (Refer to Figure 1, at page 44).

Chow teaches separating an image into blocks of pixels (Refer to Figure 5b; "The method 
includes compressing, using the assigned quantization values, a macro block such that a 
resultant compressed macro block is represented by a subset of bits used to represent said 
macro block." at column 3, line 4); 

Duluk, Whitted and Chow are combinable because they are in the same field of image and 
graphics processing with specific regards to pipeline processing. 

All these claimed elements were known in the prior art at the time of the invention. Similarly 
each of the methods taught by each prior art reference are methods used for computer 
designed algorithms for application of interactive 3D graphics, including but not limited to 
graphics processing. The skilled artisan could have combined and/or substituted the teachings 
of Duluk, Whitted and Chow. The combination of these teachings would have been obvious to
the skilled artisan. The skilled artisan could have combined these claimed elements by these
known methods and there would have been no change in their respective functions. 

Therefore, the combination/substitution of these claimed elements would have yielded 
predictable results to one of ordinary skill in the art at the time of the invention. 

Additionally, "shaders profoundly affect the realism that can be achieved in computer generated
images." (Whitted, page 43, paragraph 2 under subsection "Design Philosophy"). At the time that
the invention was made, it would have been obvious to one of ordinary skill in the art to combine 
the teachings of Duluk, Whitted and Chow to obtain the specified claimed elements of Claim 22. 

Response to Arguments 

5. Applicant's arguments filed 28 August 2008, have been fully considered but they are not 
persuasive. 

The claims are interpreted in light of the specification, limitations from the specification are not 
read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). 

6. With regards to the remarks at page 7 of 12 that Fuchs does not disclose at least the
"processing within at least one shader module":

The Examiner disagrees. Fuchs at page 81 for example, teaches rendering wherein, the 
algorithm represented by this prior art reference teaches low-level details with regards to 
teaching a computer system to instruct a graphics processor on how certain shapes should
look when they are being processed in a virtual lighting environment. Similarly, at page 79,
Fuchs teaches "3D Graphics and Realism"-color, shading and texture, visible surface
algorithms. The combination of the teachings is the basis for the rejection of the independent 
claims rejected above. Each of the prior art references teaches one or more elements of the 
rejection, however, those elements in combination teach the claimed elements of independent 
claims 1, 7, 11, 13 and 15. 

With regards to the argument that Chow has a deficiency, at page 8 of 12, the Examiner does 
not agree. The Examiner maintains that Chow in combination with Whitted and Fuchs teaches 
the claimed limitations of independent claims 1, 7, 11, 13 and 15. With regard to Whitted, "the shader
function that performs an interpolation of intensity values for pixels processed using a z-buffer to
determine visibility" is only a portion of the processing that Whitted describes. Whitted describes
transformation and clipping each of which contribute to the "visibility calculations" (page 1, 
paragraph 1) of the processing of the pixels being performed. The Examiner maintains that 
although Whitted was published before the instant invention, the combination of Chow, Whitted
and Fuchs teaches the claimed elements. 

With regards to the remark, at page 9 of 12, that "there is no disclosure or teaching of a shader 
module within a GPU performing matrix operations such as multiplying a column or row of pixels 
with a predetermined matrix to generate a corresponding set of output pixels, determining and 
sampling the pixels" as stated at claim 1, the Examiner is unclear as to the interpretation that has
been suggested. The combination of Chow, Whitted and Fuchs teaches the matrix operations 
performed by implementing a DCT in a GPU for processing each block of pixels, within at least 
one shader module.

With regards to the remark, at page 9 of 12, "A rendering function, as disclosed in Fuchs, is not
the same as a shader module." The Examiner disagrees. At Figure 3, page 82, 5.3 Renderer,
Fuchs teaches "quadratic expressions while not essential for polygon rendering, are very useful
for rendering curved surfaces and for computing a spherical radiosity lighting model (see section
7.6)." Fuchs more than sufficiently teaches processing blocks of pixels in parallel within at least
one shader module. 

Conclusion 

7. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as 
set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within TWO 
MONTHS of the mailing date of this final action and the advisory action is not mailed until after 
the end of the THREE-MONTH shortened statutory period, then the shortened statutory period 
will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 
CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, 
however, will the statutory period for reply expire later than SIX MONTHS from the mailing date 
of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner 
should be directed to Mia M. Thomas whose telephone number is (571)270-1583. The 
examiner can normally be reached on Monday-Thursday 8am-5pm. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Vikkram Bali can be reached on 571-272-7415. The fax phone number for the 
organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private 
PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you 
would like assistance from a USPTO Customer Service Representative or access to the 
automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Mia M Thomas/
Examiner, Art Unit 2624 



/Vikkram Bali/

Supervisory Patent Examiner, Art Unit 2624 



